Generic multiset programming with discrimination-based joins and symbolic Cartesian products

Authors

  • Fritz Henglein
  • Ken Friis Larsen

Abstract

This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in several ways: multisets may contain elements of arbitrary first-order data types, including references (pointers), recursive data types and nested multisets; it contains an expressive embedded domain-specific language for specifying user-definable equivalence and ordering relations, extending the built-in equality and inequality predicates; it admits mapping arbitrary functions over multisets, not just projections; it supports user-defined predicates in selections; and it allows user-defined aggregation functions. Most significantly, it avoids many cases of asymptotically inefficient nested iteration through Cartesian products that occur in a straightforward stream-based implementation of multisets. It accomplishes this by employing two novel techniques: symbolic (term) representations of multisets, specifically for Cartesian products, which facilitate dynamic symbolic computation, that is, interspersing algebraic simplification steps with conventional data processing; and discrimination-based joins, a generic technique for computing equijoins based on equivalence discriminators, as an alternative to hash-based and sort-merge joins. Full source code for GMP in Haskell is included for experimentation; it is based on generic top-down discrimination, whose code is not included. We provide illustrative examples whose performance indicates that GMP, even without requisite algorithm and data structure engineering, is a realistic alternative to SQL even for SQL-expressible queries.
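
The two techniques named in the abstract can be illustrated with a minimal Haskell sketch. It is not GMP's actual code or API: the names `MSet`, `Elems`, `Cross`, `Union`, `count`, `disc` and `joinOn` are hypothetical, and `disc` is a stand-in that groups by key with `Data.Map` and an `Ord` constraint, whereas GMP is described as using generic top-down discrimination. The sketch keeps a Cartesian product as an unevaluated term, so its cardinality can be computed algebraically, and computes an equijoin by partitioning both inputs by key instead of testing every pair.

```haskell
{-# LANGUAGE GADTs #-}
import qualified Data.Map.Strict as Map

-- Hypothetical symbolic multiset: products and unions stay unevaluated terms.
data MSet a where
  Elems :: [a] -> MSet a
  Union :: [MSet a] -> MSet a
  Cross :: MSet a -> MSet b -> MSet (a, b)

-- Materialize a multiset as a list (element order carries no meaning).
toList :: MSet a -> [a]
toList (Elems xs)  = xs
toList (Union ss)  = concatMap toList ss
toList (Cross s t) = [(x, y) | x <- toList s, y <- toList t]

-- Cardinality by algebraic simplification: |s x t| = |s| * |t|,
-- so no pair of a product is ever constructed.
count :: MSet a -> Int
count (Elems xs)  = length xs
count (Union ss)  = sum (map count ss)
count (Cross s t) = count s * count t

-- Stand-in for an equivalence discriminator (assumption: Ord keys and
-- Data.Map grouping, not GMP's generic top-down discrimination).
-- Values are partitioned into groups that share an equal key.
disc :: Ord k => [(k, v)] -> [[v]]
disc kvs = Map.elems (Map.fromListWith (flip (++)) [(k, [v]) | (k, v) <- kvs])

-- Equijoin: tag both inputs, discriminate them together by key, and
-- return a union of small symbolic products, one per key group.
joinOn :: Ord k => (a -> k) -> (b -> k) -> MSet a -> MSet b -> MSet (a, b)
joinOn f g s t =
  Union [ Cross (Elems [x | Left x <- grp]) (Elems [y | Right y <- grp])
        | grp <- disc tagged ]
  where
    tagged = [(f x, Left x) | x <- toList s] ++ [(g y, Right y) | y <- toList t]

-- Illustrative data (made up for this example).
customers :: MSet (Int, String)
customers = Elems [(1, "Ann"), (2, "Bob"), (3, "Cy")]

orders :: MSet (Int, String)
orders = Elems [(1, "apple"), (2, "pear"), (1, "fig")]

main :: IO ()
main = do
  print (count (Cross customers orders))          -- 9, computed without building pairs
  print (count (joinOn fst fst customers orders)) -- 3
  print (toList (joinOn fst fst customers orders))
```

In this sketch the join result is itself a symbolic term, so a later aggregation such as `count` still runs without enumerating the joined pairs; only `toList` forces materialization.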

Related resources

Optimizing relational algebra operations using generic partitioning discriminators and lazy products

We show how to implement in-memory execution of the core relational algebra operations of projection, selection and cross-product efficiently, using discrimination-based joins and lazy products. We introduce the notion of (partitioning) discriminator, which partitions a list of values according to a specified equivalence relation on keys the values are associated with. We show how discriminator...

Optimizing Inequality Joins in Datalog with Approximated Constraint Propagation

Datalog systems evaluate joins over arithmetic (in)equalities as a naive generate-and-test of Cartesian products. We exploit aggregates in a source-to-source transformation to reduce the size of Cartesian products and to improve performance. Our approach approximates the well-known propagation technique from Constraint Programming. Experimental evaluation shows good run time speed-ups on a rang...

Multiset Discrimination − a Method for Implementing Programming Language Systems Without Hashing

It is generally assumed that hashing is essential to many algorithms related to efficient compilation; e.g., symbol table formation and maintenance, grammar manipulation, basic block optimization, and global optimization. This paper questions this assumption, and initiates development of an efficient alternative compiler methodology without hashing or sorting. Underlying this methodology are se...

Higher-Order Containers

Containers are a semantic way to talk about strictly positive types. In previous work it was shown that containers are closed under various constructions including products, coproducts, initial algebras and terminal coalgebras. In the present paper we show that, surprisingly, the category of containers is cartesian closed, giving rise to a full cartesian closed subcategory of endofunctors. The ...

An Algebraic Approach to Stable Domains

Day [75] showed that the category of continuous lattices and maps which preserve directed joins and arbitrary meets is the category of algebras for a monad over Set, Sp or Pos, the free functor being the set of filters of open sets. Separately, Berry [78] constructed a cartesian closed category whose morphisms preserve directed joins and connected meets, whilst Diers [79] considered similar fun...

Journal:
  • Higher-Order and Symbolic Computation

Volume 23

Publication date: 2010